Q1. Classification Model (40 points)
Accuracy: 0.9790136411332634
The failure above could be due to missing or mistaken stereotypical global features, e.g. a chair should have a horizontal board.
Q2. Segmentation Model (40 points)
Accuracy: 0.9031894651539708
Best Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 0.9967
Sample Accuracy: 0.9961
Sample Accuracy: 0.9939
Worst Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 0.4625
Sample Accuracy: 0.4703
The first failure case might be due to the current network's receptive field, which covers either a single point or all points, with nothing in between. The second is most likely due to labels that are geometrically mixed.
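To make the "single point or all points" receptive field concrete, here is a minimal NumPy sketch. It assumes a PointNet-style design in which each point's own feature is concatenated with a global max-pooled feature. The function name `segmentation_features` and the toy shapes are hypothetical, not taken from the actual model code.

```python
import numpy as np

def segmentation_features(point_feats):
    """Concatenate each point's own feature with the global max feature.

    point_feats: (N, C) array of per-point features (toy shapes, assumed).
    Returns an (N, 2C) array.
    """
    # Global feature: max over all N points, one value per channel.
    global_feat = point_feats.max(axis=0, keepdims=True)            # (1, C)
    # Every point receives the same copy of the global feature.
    tiled = np.repeat(global_feat, point_feats.shape[0], axis=0)    # (N, C)
    return np.concatenate([point_feats, tiled], axis=1)             # (N, 2C)

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
combined = segmentation_features(feats)
print(combined.shape)
```

In this sketch the first C channels of each row depend only on that one point and the last C channels depend on all points, so no channel summarizes a local neighborhood.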
Q3. Robustness Analysis (20 points)
Random Rotation
Apply a different random rotation to each test sample.
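The per-sample rotation could be sketched as follows. This is an illustration under assumptions, not the evaluation code used here: the rotation is drawn from the QR decomposition of a Gaussian matrix, and the names `random_rotation_matrix` and `rotate_each_sample` are hypothetical.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Draw a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    a = rng.normal(size=(3, 3))
    q, r = np.linalg.qr(a)
    # Fix the sign ambiguity of the QR factorization column by column.
    q = q * np.sign(np.diag(r))
    # Flip one axis if we ended up with a reflection instead of a rotation.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def rotate_each_sample(points, rng):
    """points: (B, N, 3). Each of the B samples gets its own rotation."""
    rotated = np.empty_like(points)
    for i in range(points.shape[0]):
        rot = random_rotation_matrix(rng)
        rotated[i] = points[i] @ rot.T
    return rotated

rng = np.random.default_rng(0)
test_points = rng.normal(size=(4, 50, 3))   # toy stand-in for the test set
rotated_points = rotate_each_sample(test_points, rng)
```

A rotation leaves the per-point labels and the distance of each point from the origin unchanged, so the same ground truth can be reused for evaluation.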
Classification
Accuracy: 0.3179433368310598
The result is poor. Because the model was not trained with random rotations, it relies on the gravity direction, which becomes a misleading cue once the test samples are rotated.
Segmentation
Accuracy: 0.3485492706645057
Best Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 0.9328
Sample Accuracy: 0.9076
Sample Accuracy: 0.8885
Worst Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 0.0128
Sample Accuracy: 0.0514
Again, the result is poor. The model was not trained with random rotations, so the gravity direction becomes a misleading cue.
Fewer points
Reduce each sample to 100 points.
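One way to reduce a sample to 100 points is sketched below. It assumes random sampling without replacement; the actual evaluation may instead simply take the first 100 points. The function name `subsample_points` and the toy input sizes are hypothetical.

```python
import numpy as np

def subsample_points(points, labels, num_points, rng):
    """Keep `num_points` randomly chosen points and their per-point labels.

    points: (N, 3), labels: (N,). The same indices are used for both so
    that segmentation labels stay aligned with the kept points.
    """
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx], labels[idx]

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))   # toy point cloud, size assumed
labels = np.arange(1000)              # toy labels (here: the point index)
sub_points, sub_labels = subsample_points(points, labels, 100, rng)
print(sub_points.shape, sub_labels.shape)
```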
Classification
Accuracy: 0.9307450157397692
The result is good (0.931 vs. 0.979 in Q1). Because the network uses a global max, the class can still be determined as long as one point's deciding feature channel stands out.
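The global-max argument can be illustrated with a toy NumPy example. The feature values, the index of the "deciding" point, and the helper `global_max_feature` are all made up for illustration. The example also forces the deciding point into the kept subset, which real subsampling does not guarantee.

```python
import numpy as np

def global_max_feature(point_feats):
    """Max over the point axis: (N, C) -> (C,)."""
    return point_feats.max(axis=0)

rng = np.random.default_rng(0)
point_feats = rng.uniform(0.0, 1.0, size=(1000, 8))  # toy per-point features
critical = 123                                        # hypothetical deciding point
point_feats[critical, 3] = 5.0                        # it dominates channel 3

# Keep 100 of the 1000 points and make sure the deciding point is among them.
keep = rng.choice(1000, size=100, replace=False)
if critical not in keep:
    keep[0] = critical

full_feat = global_max_feature(point_feats)
sub_feat = global_max_feature(point_feats[keep])
print(full_feat[3], sub_feat[3])
```

Channel 3 of the global feature is identical before and after subsampling as long as the deciding point survives; the other 900 dropped points do not affect it.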
Segmentation
Accuracy: 0.8055591572123176
Best Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 1.0000
Sample Accuracy: 1.0000
Sample Accuracy: 1.0000
Worst Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 0.2800
Sample Accuracy: 0.3200
Accuracy also does not drop much (0.806 vs. 0.903 in Q2). Because the receptive field is either a single point or all points, as long as one of the 100 points stands out in a given channel, the global max over that channel still provides a holistic understanding of the entire point cloud.
Q4. Bonus Question - Locality (20 points)
Implemented a Point Transformer with the number of nearest neighbors set to 8, trained on randomly rotated samples of 100 points each.
The segmentation model does not adopt the U-Net architecture; it is a plain stack of 4 transformer layers without downsampling or upsampling.
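The local neighborhood each point attends to could be gathered as in the sketch below. This is a brute-force k-nearest-neighbor lookup in NumPy, not the actual Point Transformer implementation. The name `knn_indices` is hypothetical, and counting the point itself as one of the 8 neighbors is an assumption.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest points for every point.

    points: (N, 3). Returns an (N, k) integer array. Each point is its own
    nearest neighbor (distance 0), so column 0 is the point itself.
    """
    diff = points[:, None, :] - points[None, :, :]   # (N, N, 3)
    dist2 = np.sum(diff ** 2, axis=-1)               # (N, N) squared distances
    return np.argsort(dist2, axis=1)[:, :k]

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))        # one sample with 100 points
neighbor_idx = knn_indices(points, k=8)   # (100, 8)
# Neighbor coordinates relative to each center point.
relative_pos = points[neighbor_idx] - points[:, None, :]   # (100, 8, 3)
print(neighbor_idx.shape, relative_pos.shape)
```

With 100 points the full pairwise distance matrix is cheap to build. Stacking several layers lets information travel beyond the immediate 8 neighbors, since each layer propagates features one neighborhood further.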
Classification
Accuracy: 0.8719832109129066
The result is much better than in the previous random-rotation evaluation (0.872 vs. 0.318), because the model now has local receptive fields and lets features propagate to nearby neighbors.
Segmentation
Accuracy: 0.6960291734197731
Best Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 1.0000
Sample Accuracy: 0.9900
Sample Accuracy: 0.9900
Worst Segmentation Ground Truth (left) vs Prediction (right):
Sample Accuracy: 0.0500
Sample Accuracy: 0.0900
This is also much better than in the previous random-rotation evaluation (0.696 vs. 0.349), because the model now has local receptive fields.